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OF TEXT AND AUDIO DATA 



5 BACKGROUND OF THE INVENTION: 

1 . Field of the Invention 

The present invention relates to data processing and speech signal 

processing, and relates more particularly to the synchronization of text or images to 

10 speech. 

s y 2. Description of the Related Art 

_ s ;5 Literary works and the like have long been reproduced in differing 

=; " audio formats for users who may wish to review the work by listening to it rather 
35 than reading it. Audio books, for example, have been produced so that people may 
3 listen to "books on tape" while driving. Additionally, such audio books have been 
^ produced for use by the blind and the dyslexic. These audio books have typically 

been provided in analog cassette form or on compact-disc (CD-ROM) by having a 

human narrator in a recording studio read and record the text. In one way of 
20 making the recording, one narrator starts at the beginning of the book and 

continues in page number sequence and continues recording until the end is 

reached. 

There exists an open source data standard called the Digital Audio- 
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Based Information System, or DAISY, by the DAISY Consortium for the creation of 
digital "talking books". The DAISY standard provides a selectable table of contents 
by which a user may choose a section of material to be played back audibly. 

Some software packages have been introduced using the DAISY 
5 standard to produce an audio book. These allow for recordings of the audio data to 
form the book made by one or more narrators in segments in accordance with a 
pre-set format. Typical programs are LP STUDIO PLUS and LP STUDIO PRO 
developed by The Productivity Works and Labyrinth. These programs allow for the 
^ creation, management, editing and production of synchronized multi-media 
ofo materials including the synchronization of audio recordings to text and image data. 
Cn In a typical method of generating digital audio books using these prior 

l % art software systems, a playback control format is established, such as a 
?g navigational control center (NCC) format, based on sections of the source text to 
M be recorded. The NCC predefines sections of synchronizable text prior to recording 
Q5 corresponding audio portions. Audio data for one or more of the text sections is 
then recorded, in any order. Once the audio portion has been recorded for one or 
more text sections, a Synchronized Multimedia Integration Language (SMIL) file is 
generated. Each SMIL file may correspond to one or more of the text sections with 
the associated audio data of one or more audio portions. In the LP STUDIO 
20 products, a skeleton SMIL is generated before the audio portion is recorded and is 
filled in as the recording is made. Once corresponding audio data portions have 
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been recorded for all sections of the text source, the SMIL files are integrated 
according to the original definitions provided by the pre-established NCC and placed 
on a medium for audio playback by a user. 

For a typical audio recording of a commercial type talking book, a 
5 single narrator reads the entire text. Situations may exist in which it is not 
economically feasible to have one narrator do this and a group of volunteer 
narrators is to be used each to record audio data for one or more sections. This 
particularly is advantageous in charitable type work, such as in producing talking 
books for the blind or dyslexic. The plurality of narrators are to record the audio 
:l0 data in a rather unstructured manner. That is, one volunteer may have time only to 
£m record one segment, a volunteer works periodically, and several volunteer narrators 
! *f are available to record data at the same time. 

n For the above described prior art and other recording systems, once 

m the NCC file has been defined, it can not be altered without recreating an entirely 
Q5 new data structure. This limits the ability of the ordering sequence and other 
attributes of the text files once the NCC has been established. Accordingly, it 
would be preferable to introduce a more narrator-friendly method and system for 
generating digital audio books whereby the formatting of sections of text to be 
recorded can be freely changed without having to redefine the entire NCC structure. 
20 This is particularly useful in an application where a number of different narrators, 
usually volunteers, are to produce a talking book by reading different sections of 
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BRIEF SUMMARY OF THE INVENTION 

In order to address and solve certain of the foregoing shortcomings in 
the prior art of talking books, the present invention provides a method and system 
for synchronizing audio data for use in the production of a digital talking book. The 
method includes dividing stored data from a text source, such as a novel or a 
textbook, into a plurality of sections. The text sections can be a single page or 
specific group of pages, or chapter or a selected number of words of the original 
text source. Each text section is tagged so that it can be identified. 

A narrator accesses a text section that he/she is to record. The text 
section is displayed on a computer screen and the narrator reads the section out 
loud. The section that is read is recorded as an audio portion in digital format. 
The audio data portions may be recorded in any standardized format, such as a 
WAV or MP3 format. When recording, the narrator presses a "mark" button to 
indicate the start of each section of text to be synchronized (the synchronization 
points are based on the text markup). The time points at which the mark button is 
pressed is stored in a "time stamp data" (TSD) file. At any time, the TSD file can 
be converted (via the software of the system) to SMIL format. When all SMIL files 
for the text source have been generated, the NCC is built (via the software) from 
the data found in the SMIL and a Book Project Management (BPM) file. The BPM 
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identifies the text markup elements to be included in the NCC (e.g., certain kinds of 
headings). These elements are extracted from the source text and are linked to the 
corresponding SMIL files to be available to create a Daisy-compliant digital talking 
book. The final product of the group of SMIL files and NCC file is reproduced on a 
5 medium, such as on a CD ROM or other portable medium, for use by the person 
who wishes to hear the talking book. Such user places the CD in a computer or 
other player device, such as a digital talking book reader, and accesses the program 
in selected sections based on the NCC to hear the recorded text data. 

In summary, a plurality of audio portions, each corresponding to a text 
rtO section, are recorded by the same or different narrators. Since the text sections are 
5 tagged, the generation of the audio portions can take place in any desired 
! p sequence, random or ordered, and different narrators can generate different audio 
O portions. The narrators may record their audio data on the same computer at 
M different times or on separate computers at the same time or on networked 
;dl5 computers, either at different times or simultaneously. 

In accordance with a preferred embodiment of the present invention, a 
narrator may change the order of the first selection of text data and a second 
selection. The system will then re-sequence the first audio data and the second 
audio data for playback based on the new order of the text selections. Thus, the 
20 present invention provides an advantage in that audio data corresponding to text 
data on a computer may be recorded in any order by any number of people, and 
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later re-sequenced if the order of text sections or length of recorded audio portions 
is changed. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Further objects and advantages of the present invention will be more 
readily appreciated upon review of the detailed description of the preferred 
embodiments included below when taken in conjunction with the accompanying 
drawings in which: 

Fig. 1 depicts an exemplary computer system for use with the present 

invention; 

Fig. 2 is a schematic block diagram illustrating exemplary components 
of the computer system of Fig. 1 ; 

Figs. 3A-3C are block diagrams illustrating exemplary steps of a 
program according to one embodiment of the present invention for synchronizing 
audio and text data using the computer system of Fig. 1 ; and 

Figs. 4A-4D are illustrations of various exemplary screen displays of a 
computer program according to one embodiment of the present invention. 

DETAILED DESCRIPTION OF THE INVENTION 

Referring now to FIGS. 1-4D, wherein similar components of the 
instant invention are referenced in like manner, a preferred apparatus for 
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synchronizing audio and text or image data, and accompanying methods for using 
the same, are disclosed. 

Depicted in Fig. 1 is an exemplary computer workstation 10 to be 
used by a narrator and for synchronizing audio and text data according to the 
present invention. Workstation 10 is to be used by a person who is to read 
(narrate) and record data from the text source, such as a book. Hereafter, such 
person who inputs audio data is referred to as a "narrator" as distinguished from 
the person who listens to the playback and is called a "user". There can be a 
plurality of the workstations at different locations to accommodate a plurality of 
narrators. 

Each workstation can operate on a stand alone basis or be linked as 
part of network. Each workstation preferably includes a computer and several 
external interface devices. The computer has the necessary processing 
components and power to perform its intended functions either on a stand-alone 
basis or to serve as an input to one or more other central computers or servers. 
The workstation includes a display 28a by which a narrator may view, inter alia, 
text information that is to be read and recorded. An audio input device 26a, such 
as a microphone, is provided to record an audio file corresponding to the text as 
read by the narrator. A command input device 26b, such as a keyboard, may be 
used by the narrator to input appropriate commands to the workstation 10 in 
accordance with the present invention. 
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In certain embodiments of the present invention as described below, it 
is desirable for workstation 1 0 to communicate with other computer workstations 
across a network. Accordingly, a network connection 26c may be used by 
workstation 10 to send text data to and receive further audio data, commands and 
the like from remote narrators connected to the network. Network connection 26c 
may be a local area network (LAN), a wide area network (WAN), an intranet 
connection or a connection to the Internet. 

It is preferred that text and audio data be combined in accordance with 
the present invention and distributed to users who may wish to view and hear the 
combined audio/text data. Accordingly, a portable media reader/writer 28b, such 
as a floppy disk drive or a compact disk reader/writer, may be included to save a 
synchronized audio/text work on a portable, computer-readable medium after it has 
been completed. 

Fig. 2 shows an example of the internal components of computer 
workstation 10 which may be necessary to implement the present invention. The 
primary component of computer workstation 10 are conventional and include a 
processor 20, which may be any commonly available microprocessor such as the 
PENTIUM III manufactured by INTEL CORP. The processor 20 is shown operatively 
connected to a memory 22 having RAM (random access memory) and ROM (read 
only memory) sections. There also is a clock 24, a memory 30 (which stores one 
or more operating system and application programs 32), and the input device(s) 26 
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and output device(s) 28. 

The processor 20 operates in conjunction with the RAM/ROM in a 
manner well known in the art. The random-access memory portion of RAM/ROM 
22 may be a suitable number of Single In-line Memory Module (SIMM) chips having 
5 a storage capacity (typically measured in megabytes) sufficient to store and 

transfer, inter alia, processing instructions utilized by the processor 20 which may 
be received from the application program 32. The read-only memory (ROM) portion 
of RAM/ROM 22 may be any permanent non-rewritable memory medium capable 
of storing and transferring, inter alia, processing instructions performed by the 
Sjo processor 20 during a start-up routine of the computer workstation 10. The ROM 
ffl can be on-board of the microprocessor. Further functions of RAM/ROM 22 will be 
j S apparent to those of ordinary skill in the art. 

JU The clock 24 typically is an on-board component of the processor 20 

; I which has a clock speed (typically measured in MHz) which controls the processor 
Ml 5 20 performs and synchronizes, inter alia, communication between hardware 

components of the computer workstation 10. Further functions of the clock 24 are 
known to one of ordinary skill in the art. 

The input device(s) 26 may be one or more commonly known devices 
used for receiving narrator inputs and/or audio data, and transmitting the same to 
20 the processor 20 for processing. Accordingly, the input device(s) 26 may include 
the microphone 26a, the keyboard 26b, the network connection 26c, a mouse, a 
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dedicated command button, a joystick, a voice recognition unit with appropriate 
software, and the like. Input device(s) 26 may further be any number of other 
devices and/or software which are operative to allow a narrator or operator to enter 
instructions, values, audio data and the like to computer workstation 10 in 
accordance with the present invention. 

Output device(s) 28 may be one or more commonly known devices 
used by the computer workstation 10 to communicate the audio and/or textual 
data, results of entered instructions, and the like to a narrator or operator of the 
computer workstation 10. Accordingly, the output device(s) 28 may include the 
display 28a, the portable-media writer 28b, a voice synthesizer, a speaker, a 
network connection 26c, and any other appropriate devices and/or software which 
are operative to present data or the like to the narrator or operator of the computer 
workstation 10. 

The memory 30 may be an internal or external large capacity device 
for storing computer processing instructions, audio information, video information 
and the like, the storage capacity of the device typically being measured in 
megabytes or gigabytes. Accordingly, the memory 30 may be one or more of the 
following: a floppy disk drive, a hard disk drive, a CD-ROM disk and reader/writer, a 
DVD disk and reader/writer, a ZIP disk and a ZIP drive of the type manufactured by 
IOMEGA CORP., and/or any other computer readable medium that may be encoded 
with processing instructions in a read-only or read-write format. Further functions 
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of, and available devices for, memory 30 will be apparent to one of ordinary skill in 
the art. 

The memory 30 stores, inter alia, a plurality of programs which may 
be any one or more of an operating system such as WINDOWS 98 by MICROSOFT 
5 CORP, and an application program 32 (described below), as well as audio, text and 
synchronization data 34 which is necessary to implement the embodiments of the 
present invention. 

Figs. 3A-3D show the steps implemented by the application program 
used for generating synchronized audio and text data in accordance with the 
;10 invention. The application program is loaded into the computer. The process 
CP begins at step 31 with the book or other complete text is selected as the source 
H for generating the audio book. The text from the book is scanned or manually 
entered to create a computer readable source text (step 32). The source text is 
stored in a computer-readable format, such as ASCII, HTML or XML on any desired 
Q5 media, such as the computer internal memory or hard disk or on a CD, to be used 
by the computer, etc. 

A Book Project Management (BPM) file is generated in step 34. The 
BPM contains bibliographical and identifying information about the book (e.g., title, 
ISBN); a list of the marked up source text files to use; the order of these files; a list 
20 of the markup elements in the source files to be used as synchronization points; 
and a list of the markup elements to be included as high-level navigation items in 

Text & Audio Data Synchronization 
8/9/00 



the NCC. The BPM file can be created by a person through the system software, 
or it can be created through an automated process outside of the system. 

The markup tags elements, or synchronization points are applied to the 
source text document (Step 36). These tags define the text sections to be 
recorded by the narrator(s). Then tags are to be time stamped to be available for 
association with the audio data portions corresponding to the text sections. As 
indicated, these markup tags are inserted, typically in a separate process, based on 
attributes of the text (i.e. each chapter heading), by word count, page count. In a 
typical application where there are to be a number of different volunteer narrators, 
each page of a text is marked as a section. Text in between two such markup tags 
are referred to herein as a text section. The markup tags are for display to the 
narrator to indicate the section as the reading/recording is performed. 

Once the book project management file has been created, the source 
text document can be displayed to a narrator a section at a time with a 
synchronization icon (visual indicator) inserted at each location corresponding to a 
markup tag. The displayed text is preferably in an HTML or XML format. In a 
typical application there is a list of unread sections of the text source for which 
audio information is to be supplied by the narrator. The list can contain data on the 
length of the sections. The volunteer narrator selects a section for where he/she 
wishes to record audio data. That section is then displayed. 

An audio portion corresponding to the text section selected is then 
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recorded by the narrator (step 38). The text sections can be selected for recording 
as audio portions in any order. The audio portion or portions selected to be read by 
a narrator are displayed on the display screen and read into the microphone 26a 
attached to the computer 10. In a preferred embodiment of the invention, the text 
5 section for which audio data is being inputted is automatically scrolled and 

highlighted on the display screen as it is read. This is done to relieve the monotony 
of the narrator. It is also preferred that the scroll rate be adjustable. 

The application program is preferably equipped with functions such as 

ri to allow recording levels to be established and altered. Further controllable 

cBo recording functions can be audio sampling rates, audio bit sizes, audio data storage 

! ^ formats, and the like. 

As the narrator begins to read, audio data corresponding to the text is 

□ recorded in digital format in the computer memory. The operator may activate a 
mark button 86, such as is displayed in Fig. 4A below. Activation of the mark 

9 5 button 86, instructs the computer that the following audio portion is to be recorded 
and synchronized to the displayed text. The mark button may be a virtual button 
on the display 28a of the workstation 10 or may be an assigned key on the 
keyboard 26a. Once the mark button 86 has been activated, the synchronization 
icon, such as icon 96a described with reference to Fig. 4D below, which 
20 corresponds to the displayed text section being recorded, may change appearance 
to reflect that the section is being recorded. 
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Once recording of the audio portion for the text section has been 
completed, the narrator may activate the mark button 86 again. A timestamp data 
file is generated by the application program to denote that the recorded audio data 
between the two activations of the mark button corresponds to the displayed text 
5 section. Further functions relating to the recording of audio data are provided 

below with respect to Fig. 3B. A recorded audio portion may then be edited or re- 
recorded if necessary (step 40), as described below with respect to Fig. 3C. 

The application program preferably generates an SMIL file upon 
:Sa . completion of each recording audio portion (step 42). Alternatively, a plurality of 
rHo SMIL files may be created once recording of all audio portions made by a volunteer 
ffl or group of volunteers have been completed. Each SMIL file contains data which 
j ^ synchronizes a recorded audio portion to the corresponding text section. After all 

of the SMIL files for the complete text have been created, they are assembled in 
M the appropriate sequence and stored in memory 30. A playback control file, such 
Q5 as an NCC file is then generated and stored (step 44). The NCC is described in the 
DAISY 2.0 specification (available at www.daisv.org). The NCC is an HTML file 
that contains the text of all points of high-level navigation (e.g., chapters, section 
headings, pages). It also contains links from these items to the appropriate points 
in the SMIL files that describe the overall text-audio synchronization. 
20 The audio data, text data, SMIL files and the NCC file are saved onto a 

portable medium for distribution to end users (step 46). Said users would be, for 
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example, a handicapped, blind or dyslexic person. The material on the medium is 
the complete original text source and the digitally recorded audio representation 
thereof. The portable medium can be any of a CD-ROM, a DVD-ROM, a floppy 
disk, a ZIP disk, and/or any other standard, portable, computer-readable medium by 
5 which such programs may be distributed to users. However, it is to be understood 
that the combined audio/text data may be saved and stored on an Internet web 
site, or other centrally-accessible network address such as a bulletin-board system 
(BBS). Users then may connect to the Internet or the network where they may 
^ access the combined audio/text data. 

iJjO Access is accomplished by the user placing the medium in a computer 

CP that has audio playback capabilities and preferably a display so that the NCC 
[S t program can be displayed. This gives a navigational map of the material on the 
}j medium. The user, or a person assisting him, then selects a section of the text to 

U be played back in an audio form. This can be done in any sequence and the user 
Q5 can start and stop the playback at any time. As the audio playback takes place, 

the text also can be displayed if desired. 

Fig. 3B shows a process 47 for recording or re-recording audio data. 

This can be done by the original narrator who inputted the audio portion or by a 

technician or another narrator. The person making the correction indicates that an 
20 audio file is to be opened (step 48). At step 50, it is determined whether a new 

audio file is to be created or whether an old file is to be re-recorded (step 50). If a 
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new file is to be created, recording preferences are received from the user in step 
52. Then recording begins in step 54. If an existing file is to be used, it is 
retrieved and the user begins recording data at step 54. 

In a preferred embodiment of the invention, if an audio portion is to be 
5 edited, it is easier to re-record the portion. If the audio needs to be re-recorded, the 
narrator may change the status of the corresponding text section(s) to indicate the 
same. The synchronization icon corresponding to the text section(s) that to have a 
re-recorded portion reverts from a "recorded" indication to an "unrecorded" 
^ indication. 

Fig. 3C shows an exemplary process 55 for editing recorded audio 
£P data. This can be done instead of re-recording the section. First, a previously 
[ yi recorded audio file is opened (step 56). If the audio recording has any undesirable 
n features, such as a long pause, high background noise, mispronunciations, and the 
H= like, the editing utility may be used to edit out the undesirable portions (step 58). 
y 5 If editing functions were performed, the changes may be saved to the 

memory 30 (step 60). Furthermore, it is preferred that an undo/redo queue be 
provided by the application program so that any changes may be tracked an 
undone, if necessary. 

Figs. 4A-4C show exemplary screen displays, 70, 90, 92 and 94 
20 generated by the application program. In Fig. 4A, there is depicted a login screen 
display 70 which is presented upon execution of the application program. The 
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display 70 includes one or more lines 71a for accepting narrator login data, 
including narrator identification and a password. A second set of lines 71b may 
optionally be provided to login a monitor or supervisor who is monitoring, for 
example, a recording session. The display 70 may contain further narrator-actuable 
buttons 72-86 which may be used during the recording session. A clock 88 may 
further be displayed which indicates recording time of a particular recording 
session. Buttons 72-86 and clock 88 may be displayed during the entire time that 
application 82 is running. 

Button 72-86 include a full rewind button 72, a rewind button 74, a 
play button 76, a fast-forward button 78, a fast-forward to end button 80, a stop 
button 82, a record button 84, and a mark button 86. Full rewind button 72 allows 
the narrator to instruct the computer 10 to return to the start of a recorded audio 
portion. Rewind button 74 allows the narrator to instruct the computer 10 to 
return to a part of a recorded audio portion. Play button allows a narrator to 
instruct the computer 10 to play a part of a recorded audio portion from a current 
position of the audio portion. Fast forward button 78 allows a narrator to instruct 
the computer 1 0 to move to a later portion of the recorded audio portion. Fast 
forward to end button 80 allows a narrator to instruct the computer 10 to move 
forward to the end of a recorded audio portion. Stop button 82 allows a narrator to 
instruct the computer 10 to stop a playback or recording of an audio portion. 
Record button 84 allows the narrator to instruct the computer 10 to record from a 
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current point in the audio portion. The function of mark button 86 was described 
above with respect to Fig. 3A. Each of buttons 72-86 may be disabled until the 
narrator displays a section of text to be recorded. 

Fig. 4B shows a second screen display 90 wherein a dialog box 91 is 
5 presented after a narrator logs in. Dialog box 91 may display all current book 
projects for which audio data is being stored. A narrator may select one of the 
book projects before proceeding to record an audio section. 

Referring now to Fig. 4C, after a book project has been selected, a 
third screen display 92 is displayed which includes a second dialog box 93. The 
So dialog box 93 includes a list of text files for which audio data is to be recorded. The 
Cm narrator may then select a particular text file. 

^ Referring now to Fig. 4D, after a text file has been selected, a fourth 

:U screen display 94 is presented. The display 94 may show text sections and 
M; accompanying synchronization icons 96a-96e which indicate whether audio data 
Q5 has been recorded for the corresponding text section. A narrator selects a text 

section to record, or the application program may designate which text section is to 

be recorded. Once the text has been selected, buttons 72-86 may be activated for 

the narrators use. 

In a preferred embodiment of the foregoing system, the combined 
20 audio/text program that is finally created may present the entire stored text file to a 

user and may play back the audio files which correspond to the displayed text in 
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order for the displayed text. 

Although the invention has been described in detail in the foregoing 
embodiments, it is to be understood that they have been provided for purposes of 
illustration only and that other variations both in form and detail can be made 
therein by those skilled in the art without departing from the spirit and scope of the 
invention, which is defined solely by the appended claims. 
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We claim: 



1 1 . A method for synchronizing audio data to text data from a 

2 source, the method comprising: 

3 segregating a source text data of a given format into a plurality of text 

4 sections; 

5 recording in any order audio data portions each corresponding to a text 

6 section; 

_ r J converting each recorded audio portion into an audio data file; 

^8 assembling the audio data files in a sequence corresponding the given 

CFS9 format of the source text data; and 

r |0 generating for said assembled data files a playback control file 

41 indicating points of navigation of the source text data. 

□ 1 2. The method of claim 1 wherein said playback control file 

2 contains links to corresponding points of navigation of said audio data files. 

1 3. The method of claim 1 wherein a said text section correspond 

2 to any one of a chapter, page or word count of the source text data. 

1 4. The method of claim 1 wherein a narrator reads a selected text 
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2 section to form an audio portion and produces a mark of the beginning and end of 

3 the text section recorded as an audio portion to mark the points for converting to 

4 an audio data file. 

1 5. The method of claim 1 further comprising the steps of 

2 placing the assembled audio data files and playback control file on a 

3 medium for audio playback of said audio data files and display of said playback 

4 control file. 

|S 1 6. The method of claim 5 wherein said playback control file 

in 2 contains links to corresponding points of navigation of said audio data files. 

L l 7. The method of claim 5 further comprising the steps of 

j!I 2 a user selecting a text section to be played back by selecting the 

□ 3 section from the display; and 

4 audio reproduction of the selected text section. 

1 8. The method of claim 7 wherein the source data text is placed on 

2 the medium and the text is displayed upon selection from the playback control file. 

1 9. The method of claim 1 wherein the audio portions are recorded 
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2 by one or more narrators in any sequence and at any time or times. 

1 10. The method of claim 9, wherein the audio data is produced by 

2 at least two narrators located at the same or separate computer terminals. 

1 11. The method of claim 10, wherein the computer terminals are 

2 linked to a network of at least one of a local area network, a wide area network, an 

3 intranet, and an Internet connection. 

M g l 1 2. The method of claim 1 , wherein the text data is stored in at 

i;P2 least one of an HTML format and a XML format. 

; U i 1 3. The method of claim 1 , further comprising: 

i"1 2 storing the playback control data, the audio data and the text data on 

Q3 a portable computer-readable medium. 

1 14. The method of claim 12, wherein the portable computer- 

2 readable medium comprises at least one of a CD-ROM, A DVD-ROM, a ZIP disk, 

3 and a floppy disk. 

1 1 5. Apparatus for synchronizing audio data to a plurality of text data 
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2 sections of a source text, comprising: 

3 a computer including means for recording in any order audio data 

4 portions each corresponding to a text section; 

5 means for converting each recorded audio portion into an audio data 

6 file; 

7 means for assembling the audio data files in a sequence corresponding 

8 the given format of the source text data; and 

9 means for generating for said assembled data files a playback control 
40 file indicating points of navigation of the source text data. 

oil 16. Apparatus as in claim 15 wherein said playback control file 

! Jf2 contains links to corresponding points of navigation of said audio data files. 

l-^ 1 1 7. Apparatus as in claim 1 5 further comprising: 

□ 2 means for receiving a medium having recorded thereon said audio files 

3 and said playback control file; 

4 means for displaying said playback control file; and 

5 means for playing back in audio form the data of an audio file selected 

6 from the displayed playback control file. 
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1 18. Apparatus as in claim 17 wherein said text data sections are 

2 recorded on said medium and said display means displays the text section 

3 corresponding to the selected audio portion. 
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ABSTRACT 



A method and apparatus for synchronizing audio data to text data in 
which the text is manually or automatically divided into an arbitrary number of 
sections and a text section is selected for display and presented to a narrator for 
audio input. A human narrator reads the displayed selected section of text into a 
computer to form an audio data portion for the selected section that is linked to the 
text by entering synchronization marks to the computer. The sections of text may 
be selected for reading and audio input in any order and by any number of 
narrators. Each recorded audio portion is converted to an audio data file and the 
audio data files are re-sequenced to conform to the original order of the text based 
on a playback control file. A plurality of narrators may contribute audio data from 
separate workstations of a network, or workstations which are connected to a 
central server over a communications network, such as the Internet. After the 
sections of text and accompanying audio have been properly sequenced and 
assembled into a single audio/text program, it is stored on a portable computer- 
readable medium (e.g. a CD-ROM) and a user may select from the playback control 
file any portion of the text for view and/or select any portion of viewed text and 
hear the audio portion corresponding thereto. 
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"Arme was a petite, elegant woman who, through her tenacious efforts, helped to make equal the educational opportunities for people who were 
visually impaired The goals she set were attained by her quite dedicauon and grow and flower m her memory " 
Amanda 8c Jack Casties, RPB&D board members ementi 

"It really ls overwhelming to think of how many people's lives have been changed and, I hope, how many more will be changed because of my 
grandmother's efforts, which have now led the efforts of so many other people " 

Cammy Thomas, Arme T Macdonald' s granddaughter, RPB&D volunteer and Boston unit board member 



Ha Tribute To RFB&D's Founder, Anne T. Macdonald 



!^,^X>n a bright May mommg m 1988. Yale University conferred honorary degrees on nine distinguished men and women At 93. Anne T Macdonald was 
both the oldest and smallest of the nme. who included actor Paul Newman and Princeton University President Harold Shapiro. But hers had been a towering 
achievement, for she had brought into being an organization that, as prize-winning poet, playwright and education Archibald MacLeish put it, "changed the 
whole shape of the future for the young bfcnd of America * 

&.^wTorty years prior, Anne T Macdonald, the wife of a Wall Street financier, had joined the Women's Auxiliary of the New York Public Library As a Red 
Cross volunteer and former Army nurse, she had worked with wounded veterans and was attracted to the auxiliary's program of recording textbooks for bind 
veterans attending college under the GI Bill of Pdghts Ironically, her voice lacked the clarity to record text, so she directed her talents into leadership Thanks to 
her spirit and tenacity, a small auxiliary "program" grew into RFB&D Yale's honorary degree was not the first award for Macdonald In 1971, the American 

advance library services to the blmd. 

m 
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AND POWER OF ATTORNEY 
Original Application 

As one of the below named inventors, we declare that the information given herein 
is true, that we believe that we are the original, first and sole inventor if only one 
name is listed at 1 below, or a joint inventor if plural inventors are named below, of 
the invention entitled: 

METHOD AND APPARATUS FOR 
SYNCHRONIZATION OF TEXT AND AUDIO DATA 

which is described and claimed in: 

[x] the attached specification or [] the specification in application 

Serial No. , filed 

(for declaration not accompanying appl.) 

that we do not know and do not believe that the same was ever known or used in the 
United States of America before my or our invention thereof or patented or described 
in any printed publication in any country before my or our invention thereof, or more 
than one year prior to this application, or in public use or on sale in the United States 
of America more than one year prior to this application, that the invention has not 
been patented or made the subject of an inventor's certificate issued before the date 
of this application in any country foreign to the United States of America on an 
application filed by me or my legal representatives or assigns more than twelve 
months prior to this application, that I acknowledge my duty to disclose information 
of which I am aware which is material to patentability in accordance with 37 CFR 
§1.56. I hereby state that I have reviewed and understand the contents of the 
above-identified specification, including the claims, as amended by any amendment 
referred to above. I hereby claim the priority benefits under 35 U.S.C. 1 1 9 of any 
application(s) for patent or inventor's certificate listed below. All foreign applications 
for patent or inventor's certificate on this invention filed by me or my legal 
representatives or assigns prior to the application(s) of which priority is claimed are 
also identified below. 
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COUNTRY 
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POWER OF ATTORNEY: 

As a named inventor, I hereby appoint the following attorney(s) and/or agents(s) to prosecute this application and transact all 
business in the Patent and Trademark office connected therewith: Gordon D. Coplein #19,1 65, Wifliam F. Dudine, Jr. #20,569, 
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805 Third Avenue 

New York, NY 1 0022 ' 21 2-527-7700 
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